ABSTRACT


A major problem in developing a job evaluation plan is the estimation of individual
rater consistency and degree of interrater agreement. A method for making these
estimations is proposed which combines a multiple regression model with a mathematical
grouping model in quantifying a measure of predictive efficiency. Officers ranked 50
simulated Air Force specialties, each of which consisted of preassigned scale values for
10 job requirement factors. Thirty-eight officers ranked the jobs on the basis of merited
pay, 36 on merited grade. Each rater's consistency was evaluated by a multiple regression
equation predicting his rank-ordering of the jobs from the factor values. Consistency of
policy among raters was measured by the loss in predictive efficiency when a single
equation represented the joint policy of the group. Measures of rater consistency showed
that all but 2 of the raters were adequately consistent. Measures of interrater agreement
indicated that raters were applying a homogeneous policy, whether they ranked on merited
pay or merited grade. The officer raters (captains and majors) were capable of applying
a consistent policy in evaluating jobs when their only information was an estimate of
the job requirements.
AN  APPLICATION  TO  JOB  EVALUATION  OF  A  POLICY-CAPTURING  MODEL
FOR  ANALYZING  INDIVIDUAL  AND  GROUP  JUDGMENT 


Ward (1961) introduced a concept of grouping or clustering in an iterative fashion which
enables the investigator to specify the cost of grouping at each iteration in the procedure.
The cost is expressed in terms of a function defined by the investigator. Bottenberg &
Christal (1961) describe an application of this concept where the function is predictive
efficiency and the objective is to minimize loss of predictive efficiency as the grouping
proceeds. Christal (1963) discusses application of the predictive efficiency function to the
development of on-the-job criterion composites using simulated job incumbents, to job
evaluation using simulated jobs, and to identifying a homogeneous policy among selection
board members. In discussing these applications, the method is referred to as JAN, for
judgment analysis.

In Christal's discussion of the job evaluation application, it is suggested that jobs may
be simulated by use of ratings on a series of job evaluation factors. Narrative description
is omitted, and the judge is asked to make criterion decisions based on the factor ratings
only. In this manner the job is simulated, and influences present in job descriptions which
tend to distort judgment, such as prestige value, are eliminated.

After criterion decisions, such as a rank ordering of a set of simulated jobs, have been
obtained, the first step in the analysis procedure is to compute a least squares solution of
a multiple regression equation to predict the criterion decisions given by each rater, using
the factor ratings as predictors. Using the R² computed for each individual, unacceptable
raters may be eliminated by comparing the R²s computed from their equations with the R²s
obtained for the other judges in the sample. Next, a single value of R² is computed to
indicate the overall predictive efficiency when all the individual rater equations are
considered. Then every individual equation is compared with all others, and the two raters
who have the most homogeneous equations are located. The computer prints the single
equation that best represents the joint policy of these two judges, as well as the loss in
overall predictive efficiency that results when the N original equations are reduced to N-1
equations. Subsequently, the procedure systematically reduces the number of raters or rater
clusters by one at each step until all raters have been grouped into a single cluster. At each
step, examination of the loss in overall predictive efficiency (the reduction in R²) makes it
possible to identify the different policies which may exist in the sample.
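The grouping steps described above can be sketched in a few lines of code. The following is a minimal illustration, not the original JAN program: the function names, the simulated raters, and the use of four factors and 30 jobs are all assumptions made for the example. Overall R² is taken here as one minus the pooled residual sum of squares over the total sum of squares about the grand mean, which is one plausible reading of "overall predictive efficiency". The fitting is written in plain Python so that the sketch runs without third-party packages.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def sse_of_fit(rows, ys):
    """Residual sum of squares of a least-squares fit of ys on rows (intercept added)."""
    xs = [[1.0] + list(r) for r in rows]
    p = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, x))) ** 2 for x, y in zip(xs, ys))

def cluster_sse(cluster, factors, rankings):
    """Residual sum of squares when one equation must serve every rater in the cluster."""
    rows = [f for _ in cluster for f in factors]
    ys = [y for r in cluster for y in rankings[r]]
    return sse_of_fit(rows, ys)

def jan_grouping(factors, rankings):
    """At each step, merge the pair of clusters whose merger loses the least overall R²."""
    all_y = [y for r in rankings for y in r]
    mean = sum(all_y) / len(all_y)
    sst = sum((y - mean) ** 2 for y in all_y)
    clusters = [[i] for i in range(len(rankings))]
    sse = [cluster_sse(c, factors, rankings) for c in clusters]
    history = [(len(clusters), 1 - sum(sse) / sst, [c[:] for c in clusters])]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = cluster_sse(clusters[i] + clusters[j], factors, rankings)
                cost = merged - sse[i] - sse[j]
                if best is None or cost < best[0]:
                    best = (cost, i, j, merged)
        _, i, j, merged = best
        clusters[i] = clusters[i] + clusters[j]
        sse[i] = merged
        del clusters[j], sse[j]
        history.append((len(clusters), 1 - sum(sse) / sst, [c[:] for c in clusters]))
    return history

def simulate_ranks(factors, weights, noise, rng):
    """Hypothetical rater: ranks jobs (1 = highest) by a weighted factor sum plus noise."""
    scores = [sum(w * f for w, f in zip(weights, row)) + rng.gauss(0, noise) for row in factors]
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    ranks = [0] * len(scores)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return ranks

rng = random.Random(0)
factors = [[rng.randint(1, 9) for _ in range(4)] for _ in range(30)]  # 30 simulated jobs
policy_a, policy_b = [3, 2, 1, 0], [0, 1, 2, 3]                       # two invented policies
rankings = [simulate_ranks(factors, policy_a, 1.0, rng) for _ in range(3)]
rankings += [simulate_ranks(factors, policy_b, 1.0, rng) for _ in range(3)]
history = jan_grouping(factors, rankings)
for n_groups, r2, groups in history:
    print(n_groups, round(r2, 3), groups)
```

With two deliberately different simulated policies, the sketch should lose little R² until only two clusters remain and then show a large drop at the final merger, which is the pattern the procedure uses to reveal distinct policies.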

The application of JAN to the resolution of board or group disagreement described by
Christal is a special case of policy analysis. If the policies of a group of judges, as
expressed by regression equations, are not homogeneous, the source of disagreement in terms
of specific factors can be located. Arbitration can then be efficiently directed to the source
of heterogeneity of policy. Furthermore, the nature of the policy structure in the group can
be identified. There may be two distinct and clear-cut policies which divide the group into
two parts, or there may be many separate policies with widespread disagreement distributed
among all the factors being used. Such a policy analysis is not only pertinent as a measure
of interrater agreement, but it also identifies the source and extent of disagreement.

Several desirable attributes of this procedure are described in Christal's paper. One is
that the rater is unable to display any halo effect, since he is judging job characteristics
and not the job itself. Another is that the R² computed for each rater is an evaluation of his
consistency.* Further, it is of considerable value to be able to identify the homogeneity of
the policies of the judges. If very little predictive efficiency is lost by grouping, we have
a strong indication that all judges in the sample tend to use essentially the same policy. In
this respect, an expression of the degree of interrater agreement is provided.


PURPOSE 

In the development of a job evaluation plan for the Air Force, one of the critical tasks
is assuring that interrater agreement is high among the raters whose judgments form the
basis of the plan. If interrater agreement is measured in terms of homogeneity of policy,
the additional information concerning the policy structure in a sample will also describe
the nature of the agreement or disagreement. It is also desirable to identify a method for
assessing the consistency of judgment for any particular judge. The purpose of this paper
is to report an application of JAN to the estimation of homogeneity of policy and individual
rater reliability.


METHOD 

A  different  simulated  job  was  printed  on  each  of  50 
cards.  Samples  of  judges  then  ranked  all  50  cards  on  either 
merited  pay  or  merited  grade. 


Simulation  of  Jobs 

A sample of 50 officer job descriptions was selected from a group of 144 job descriptions
which were representative of Air Force officer jobs and available from a previous study
(Madden, 1963a). Two psychologists experienced in job evaluation rated each of the 50 jobs
on 10 factors. Differences were then arbitrated until a single value for each job on each of
the 10 factors was agreed upon. For each job, the 10 factor names and the corresponding
factor values were printed on a card as shown in Figure 1. The officers selected to
rank-order the simulated jobs were informed that each card represented an officer job and
that the ratings given on the 10 factors were typical of the ratings ordinarily assigned to
the job.


*This might be untrue for the unlikely situation in which the judge has taken into
account interactions among the predictors, or nonlinear relationships between one or more
of the predictors and the criterion, which have not been included in the model.
Fig. 1. Sample card used to simulate a job.
Ranking  the  50  Simulated  Jobs

Captains and majors who were students in the 61-62 class at Air Force Command and
Staff College served as judges. Merited pay was the basis of ranking for 38 judges, and
merited grade was the basis of ranking for 36 judges. After each judge had ranked the 50
simulated jobs, he wrote the rank number, from 1 to 50, in the lower right corner (labelled
"RANK NR" in Figure 1) of each of the 50 cards.

RESULTS 

The policy of each rater was expressed in a regression equation predicting his
rank-ordering of the jobs from the factor values. Table 1 gives the distribution of R² for
the 36 judges who ranked the simulated jobs in terms of merited grade and the 38 judges
who ranked them in terms of merited pay.

The R²s of .15 and .31 showed the two judges who obtained them to be so inconsistent
as to disqualify them as judges. Hence they were not included in the grouping computations.

The results of the grouping procedures are summarized in Table 2 for the two groups
of judges, separately and combined. As the number of groups was reduced to three, R²
decreased from .93 to .88 for merited pay judgments, from .92 to .88 for merited grade,
and from .93 to .86 for the combined sets. The final R²s for one group were .78, .84,
and .81, respectively.

DISCUSSION 

That R²s were over .70 for all except two cases indicates a high level of predictability
of the rank ordering of the 50 simulated jobs from knowledge of factor scores. The officers
in these samples are able to utilize the scores on 10 job evaluation factors in a consistent
manner, so that the rank ordering of 50 patterns of 10 scores is predictable to the high
degree indicated in Table 1. It may be concluded that these officers can make reliable
judgments concerning simulated jobs and, further, that judges who are inconsistent, as
evidenced by low R²s, may be identified and eliminated on the basis of their performance
as judges.


Table 1. Distribution of R² Predicting Rank Order From Factor Values


Table 2. Effect on R² of Number of Groups

Basis of Ranking              No. of Groups    No. of Judges in Each Group    R²

Merited Pay                        38                      1                  .93
(N = 38)                            3                  33, 4, 1               .88
                                    2                    37, 1                .87
                                    1                     38                  .78

Merited Grade                      34                      1                  .92
(N = 34)                            3                 16, 16, 2               .88
                                    2                    32, 2                .86
                                    1                     34                  .84

Merited Grade & Merited Pay        72                      1                  .93
Combined                            3                  64, 5, 3               .86
(N = 72)                            2                    67, 5                .83
                                    1                     72                  .81

Although raters may be consistent in terms of the application of their own policies, they
may be in marked disagreement with other judges who are themselves consistent. If the
rankings for N judges are treated as a single variable and a single R² computed, the R² for
the N judges is a measure of the homogeneity of the N equations and indicates the amount of
agreement among the N policies (Bottenberg & Christal, 1961).
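One way to write the single R² at a stage with k groups is given below. The notation is an assumption introduced for illustration and does not appear in the sources cited: y_ij is the rank rater j assigns to job i, the hatted term is the prediction from the one equation fitted to the group g containing rater j, n is the number of jobs, and the bar denotes the grand mean of all rankings.

```latex
R^2_k \;=\; 1 \;-\;
\frac{\displaystyle\sum_{g=1}^{k}\,\sum_{j \in g}\,\sum_{i=1}^{n}
      \bigl(y_{ij} - \hat{y}^{(g)}_{ij}\bigr)^2}
     {\displaystyle\sum_{j=1}^{N}\,\sum_{i=1}^{n}
      \bigl(y_{ij} - \bar{y}\bigr)^2},
\qquad
\text{loss at a step} \;=\; R^2_{k+1} - R^2_k \;\ge\; 0 .
```

Under this reading, k = N gives each rater his own equation, and k = 1 forces a single joint policy on the whole sample, so the gap between the two values expresses how much agreement the N policies share.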

Table 2 may be interpreted in terms of the agreement among policies at various stages of
grouping for each of the two experimental conditions. Generally, the greatest loss occurs
toward the end of the grouping procedure; in other words, the rate of loss of predictive
efficiency increases with each step. Interpretation, then, may be made in two ways: (a) the
value of R² for the final step, when all cases are in a single group; and (b) the value of R²
for a particular step, viewed in terms of the number of groups and the number of cases in
each. For the two samples in this study, policy is more homogeneous when the basis of
ranking is merited grade (R² = .84) than when ranking is based on merited pay (R² = .78),
and intermediate (R² = .81) when both samples are combined. All of these values, however,
reflect an adequate level of interrater agreement. Looking at specific steps, this appearance
is strengthened. In the merited pay sample, for instance, a great deal is lost on the last
step, when one case is added to a group of 37 cases and R² drops from .87 to .78.

The pattern of grouping for merited grade is of special interest. At the 3-group stage there
are two groups of 16 cases each and one group consisting of 2 cases. The two large groups
might appear at first glance to represent two different homogeneous policies, but when these
two groups are combined into a single group on the next step, the drop in R² is only .02.
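The step-by-step losses discussed here follow directly from the R² values reported in Table 2. As a small worked check, the arithmetic can be laid out as below; the dictionary layout and the function name `losses` are illustrative choices, while the numbers are those of Table 2.

```python
# (number of groups, R²) at each reported stage, taken from Table 2
table2 = {
    "merited pay": [(38, 0.93), (3, 0.88), (2, 0.87), (1, 0.78)],
    "merited grade": [(34, 0.92), (3, 0.88), (2, 0.86), (1, 0.84)],
    "combined": [(72, 0.93), (3, 0.86), (2, 0.83), (1, 0.81)],
}

def losses(stages):
    """Drop in R² between consecutive reported stages, rounded to two decimals."""
    return [(a[0], b[0], round(a[1] - b[1], 2)) for a, b in zip(stages, stages[1:])]

for basis, stages in table2.items():
    for from_groups, to_groups, drop in losses(stages):
        print(f"{basis}: {from_groups} -> {to_groups} groups, loss in R² = {drop:.2f}")
```

For merited pay the final merger costs .09, against only .06 for all earlier steps combined, whereas for merited grade merging the two groups of 16 costs .02.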

It appears from these results that policies regarding merited pay and merited grade are
homogeneous among the officers in these two samples when judgments are based on simulated
jobs. Previous studies (Madden, 1963a, 1963b) have indicated that jobs are ranked differently
in terms of merited pay than in terms of merited grade, and that the formula which best
predicts merited pay is less predictive of merited grade, and vice versa.

It might be hypothesized that the definition of rating factors tends to be supplemented by
the object being rated, and that when the object is simulated this supplementary definition
is not present. It seems, then, that a study utilizing two conditions, one where the object
itself is rated and one where it is simulated, would yield some insight into the nature of
this supplementary definition which is provided by the object rated. In job evaluation, much
could be learned about prestige or glamour effects.